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Quantile regression for longitudinal data: unobserved 
heterogeneity and informative missingness 

Maria Francesca Marino* Nikos Tzavidis t Marco Alfo* 


Abstract 


Linear quantile regression models provide a detailed and robust picture of the (con¬ 
ditional) response distribution as function of a set of observed covariates. Longitudinal 
data analysis is an interesting field of application of such models although this kind of 
data represent a substantial challenge, due to dependence issues. Often, dependence be¬ 
tween measurements from the same units is modelled by considering sources of unobserved, 
individual-specific, heterogeneity. Quantile regression models have recently been extended 
to the analysis of longitudinal, continuous, responses, using time-const ant, see e.g. Geraci| 


andBottai (2007), or time-varying, see Farcomeni (2012), random effects. In some empirical 
applications, however, we observe both temporal shocks in the overall trend and individual- 
specific heterogeneity in model parameters. To accommodate such situations, we propose 
to define a general quantile regression model for longitudinal, continuous, responses where 
time-varying and time-constant random parameters with unspecific distribution are jointly 
taken into account. We further deal with the case of irretrievable, non ignorable, exit from 
the study (i.e. drop-out) and show how the proposed model can be interpreted in a pattern 
mixture perspective, where changes in the fixed effect vector are associated to mixture 
components describing the individual propensity to remain into the study. The proposed 
models are illustrated using a well known benchmark dataset on longitudinal dynamics of 
CD4 cells and a large scale simulation study. 


1 Introduction 

In longitudinal studies, measurements recorded on the same individual are likely associated, 
and the adopted statistical modelling tools need to account for such a dependence. A common 
framework is based on postulating a conditional model which is augmented by individual- 
specific sources of unobserved heterogeneity that are meant to describe the dependence in the 
data. 
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In the context of longitudinal studies, Geraci and Bottai (2007) have proposed a linear quan¬ 
tile model with random effects; this model has been extended, either from a structural or a 


computational perspective by Liu and Bottai (2009) and Geraci and Bottai (2014). When 
random parameters are time-varying, the assumption of a time-constant distribution may lead 


to severe bias, see eg Bartolucci and Farcomeni (2009) for a discussion. To solve this issue, 


Farcomeni (2012) has proposed a linear quantile model with time-varying random intercepts, 


see e.g Bartolucci et al. (2012) for a general treatment of the topic. In some circumstances, it 
can be reasonable to assume that both time-constant and time-varying sources of unobserved 
heterogeneity influence longitudinal observations. In these cases, the proposal by [Farcomeni 


(2012) can be extended to deal also with (discrete) time-constant random parameters, inflating 
the number of states and forcing the transition probability matrix to be (quasi) diagonal at 
the cost of increase the computational complexity. To overcome this issue, we define a linear 
quantile mixed hidden Markov model , where random parameters may be constant and/or vary¬ 
ing over time. The model specification is not constrained to random intercepts only and, thus, 
offers greater flexibility; modelling quantiles in place of the mean ensures robustness against 
the possible presence of outliers in the observed data; accounting for structured, although un¬ 
observed, random variation leads to reliable and efficient parameter estimates. 

A frequent feature of longitudinal studies is that some individuals may be unavailable at all 
the pre-determined time occasions. The presence of missing information rises a number of 
challenges because of the potential bias in the parameter estimates. We describe how the 
model approach we propose can be interpreted in a pattern mixture perspective, according to 


Roy (2003) and Roy and Daniels (2008). With this representation, the dependence between 


the observed responses and the missing data process is due to the presence of sources of unob¬ 
served heterogeneity shared by individuals with a similar propensity to drop-out. This leads 
to groups characterized by common departures from the homogeneous linear quantile hidden 
Markov model and, therefore, to a very general approach. 

The paper is structured as follows: in section [3j after discussing the proposal by [Farcomeni 


(2012), we introduce the linear quantile mixed hidden Markov model, where time-varying and 
time constant random parameters are jointly considered. In section [4j we show how the pro¬ 
posals by Roy (2003) and Roy and Daniels (2008) can be extended to the quantile context 
and to time-varying random parameter models. The resulting model can be interpreted as a 
particular specification of the general modelling framework we propose. The EM algorithm 
for parameter estimation is briefly sketched in section [5] Further details are available in the 
supplementary material. Results of the application to the CD4 dataset (Kaslow et al.[ 1987 


Zeger and Diggle, 1994) and of a large scale simulation study are given in sections [T] and [6j 


respectively. Further simulation-based evidence is included in the supplemntary material. Last 
section gives concluding remarks and outlines potential future research agenda. 
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2 Linear quantile hidden Markov model 


Let Y lf be a continuous response variable and x* f a set of covariates for unit i = 1 ,n recorded 
at time occasion t = 1 ,,T. Quantile regression extends regression analysis for the centre of 
a conditional distribution; even when this represents the focus of analysis, median regression 
offers an outlier-robust alternative to least squares. When analysing hierarchical (for example 
longitudinal) data, the dependence structure must be accounted for; a widely used approach is 


based on considering conditional models with random parameters; see Laird and Ware (1982) 
for a general discussion. 

In this framework, we rely on conditional independence given (individual-specific) latent char¬ 
acteristics (e.g. omitted covariates), and obtain marginal dependence due to measurements 
from the same individual sharing some common latent variables. Even when a (marginal) 
multivariate model does exist, the conditional approach may still be preferable, see |Lee and| 


Nelder (2004) 


Random parameters enter in the model as random intercepts and/or random slopes and may 
be either time-constant or time-varying. When there is no or limited prior knowledge about the 
possible causes of unobserved heterogeneity, the time-varying option offers greater flexibility. 
Farcomeni (2012) has proposed a linear quantile hidden Markov model ( IqHMM ) that can be 
described as follows. For a given r 6 (0,1), let {Su(t)} be a homogeneous, first order, hidden 
Markov chain with state space =5^(r) = {1,m(r)}; initial and transition probabilities are 
given by 4(r) = Pr (S it (r) = h) and q k h{j) = Pr(S' it (r) = h \ S it - i(r) = k), h, k = 1,... ,m(r), 
respectively. The IqHMM is defined by 


Q{yu | s it ,T) = x' 4 /3(r) + a s . t(T ), 


( 1 ) 


where Q(■ \ su,t ) denotes the (conditional) quantile for the i— th individual, i = 1 ,...,n, 
being in state su(t) at time t, and oc 8it r T ) denotes an individual-specific random intercept that 
evolves over time according to the hidden Markov chain described before. 

This modelling structure can be enriched by considering individual-specific random slopes to 
account for individual departures from the fixed parameters /3(r), thus relaxing the assumption 
of orthogonality between the observed covariates and the sources of unobserved heterogeneity. 


These random departures can be either time-constant, as in Geraci and Bottai (2014), or time- 


varying, as in a slighted extended version of Farcomeni (2012). In our perspective, a general 
model should account for both. 


3 Linear quantile mixed hidden Markov model 

To define a general model specification, we assume that dependence is due to different sources 
of unobserved heterogeneity, some of which are time-varying, while others are time-constant. 
Let bj(r) = (bn(r ),..., bi q (r)) be an individual-specific random parameter vector with density 
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/*>(• I D, r), where D is a quantile-dependent covariance matrix, and E(b,.(r)) = 0. For a given 
r, we define a linear quantile mixed hidden Markov model ( IqmHMM ) as follows 


Q(Vit | s it , bj, r) = x' i /3(r) + z it h j(r) + w' it a s .^ r) . 


(2) 


In the expression above, /3(r) denotes a p-dimensional vector of fixed parameters describing 
the effect of observed covariates on the T-th (conditional) response quantile, Zjj and Wjj are 
non-overlapping subsets of Xjj, while bj(r) and ct Sit (r) identify time-constant and time-varying 
random deviations from /3(r), respectively. All model parameters introduced so far are indexed 
by r; in what follows, we simplify the notation by dropping the r index. 

The Iqm.HMM is based on the following assumption. The random vector bj and the hidden 
Markov process {5^} are assumed to be independent as they are meant to capture differ¬ 
ent sources of unobserved heterogeneity. The distribution of the response variable is defined 
conditional on the hidden state occupied at time occasion t , i.e. su, and the (time-constant) 
individual-specific random vector bj; conditional on these parameters, longitudinal observation 
from the same individual are independent (local independence assumption) and the conditional 
distribution of the individual sequence is given by: 

T T 

fy\s,b(yi I S *) bj , l/} , t) = fy\s,bhjit \ Uil:t—l, Sjl:t, bj, 7") = fy\s,b(lJit \ Sit, bj, t), (3) 

t= 1 t= 1 


where = (/3, aq,..., a m ) is the vector of longitudinal model parameters. 

In expression ([3]), yn-.t-i denotes the response history for the i-th individual up to time t— 1 and 
Sji : t is the sequence of hidden states up to time t. It is worth to notice that the model in equation 
© reduces to the model by[Geraci and Bottai (2007) when a single state is considered (m = 1) 
and to the model by Farcomeni (2012) when wu = 1 and bi = 0, V?' = 1 ,... ,n,t = 1,. ..,Tj. 
In order to derive parameter estimates in a maximum likelihood perspective, we adopt an 


asymmetric Laplace distribution (Yu and Zhang, 2005) for the longitudinal responses, as in 
Geraci and Bottai||2007[ that is 


Y it | s it , bj ~ ALD {nit[s it , bj], a, r) , 


where the location parameter yu is defined by expression ©>■ Based on this assumption, 
expression © can be written as 


/j,|a,&(yi I Sj,bj, V>,r) = 


V(1 - r)' 

Ti ( Ti 

Hit 

a 

ex P S / v Pt 

l t =1 

a 


( 4 ) 


with p T (-) denoting the quantile asymmetric loss function (Koenker and Bassett, 1978). 

Let Q,D) be the vector of all model parameters; the observed data likelihood is 
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defined by 


L (* I y> T ) = n / {Y 


i= 1' 


Ti 


Y{fy\s,b(yit | S it ,hi,Ip, t) 


,t= 1 


/s (Sj | <5, Q, r) > f b (bi I D, r)dbj, (5) 


where, due to the Markov property, the marginal distribution for the hidden chain can be 
factorized as 

Ti Ti 

fs ( s i | S, Q, t) = f s (sn | S, t) [ f s ( Sit I Sit— 1) Q) t) = 5 Sil | Qsit-iSif 

t =2 t =2 


For a parametric specification of /&(■ | D,r), parameter estimation would require the solution 
of a multiple integral that does not have a closed form solution. The next section entails the 
choice of such a distribution and its effect on the likelihood approximation. 


3.1 Estimating the random parameter distribution 

When the random parameter distribution is parametrically specified, one may use a Monte 


Carlo EM algorithm, as in 

Geraci and Bottai 

(2007 

) and 

Liu and Bottai 

(2009 

), or a direct 

ML approach with Gaussian quadrature, as in 

Geraci and Bottai 

(2014 

. Both should be 


appropriately extended to deal with the hidden Markov chain. When a limited number of 
repeated measurements are available for each individual, the choice of the random parameter 
distribution can be crucial. To avoid potential misspecification, we propose to approximate 
it by using a discrete distribution on G(t ) < n support points b g with masses 7 x g = Pr(b s ), 
7 T g > 0,^ fl 7r 9 = 1 ,g = 1,..., G. This is known as the nonparametric maximum likelihood 


(NPML) estimate of the mixing distribution f b (- | D,r), see Aitkin (1999), Bohning (1982) 


and Lindsay (1983a b); locations and masses are treated as unknown parameters and estimated 
through the observed data. Let us introduce a discrete latent variable Ci = (Ca> ■ ■ ■, C*g)> to 
represent component membership. That is, Q g = 1, g = 1,..., G, if the z-th individual belongs 
to the g-th component of the mixture, zero otherwise. As before, and G depend on the 
chosen quantile r, but we will suppress this indexing to simplify the notation. Denoting by ip = 
(/3, bi, ..., be, Qi,..., a m ) the set of longitudinal model parameters and by 7r = (tti ,..., 7 tq) 
the vector of mixture component probabilities, the observed data likelihood in equation ([5]) 
becomes 


n G ( 

£($|y,F> = n^5Z 

i= 1 g =1 l Si 


Ti 


fy\s,({yit | s it,Qg — l,1p,T) 


,t =1 


f s (Si | S,Q,t) >7T, 


9 ) 


( 6 ) 


where $ = (tp, 6, Q, 7r). The computational complexity of the proposed approach is linear with 
the integral dimension, while it increases exponentially in the Gaussian quadrature approach, 
see Maruotti and Ryden (2009). Since locations in the finite mixture are completely free to 
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vary over the corresponding support, extreme departures from the homogeneous model can be 
accommodated. Direct maximization of the likelihood in equation ([b]), although possible, can 
be challenging. A generalization of the EM algorithm for finite mixtures provided by Aitkin 
(1999) offers a simple alternative. In Section [5] we briefly sketch the structure of the algorithm. 
Further detail can be found in the supplementary material. 


4 Handling non-ignorable drop-out 

In this Section we extend the IqmHMM formulation to handle potentially non-ignorable drop¬ 
out. Drop-out is a common problem in longitudinal data since individuals may leave the 
study before its planned end. While unbalanced designs do not pose particular problems, 
the question is whether missing data may bias parameter estimates. Let R.; = (Rn ,..., Rit) 
denote the missing data indicator vector for the z-th individual, where Ru = 1 if yu has not 
been observed at time t = 1, ...,T and zero otherwise. Since we are considering drop-out, that 
is irretrievable exit from the study, Ra = 1 => Rw = 1, f > t = 1,..., T. 

Let and 7 denote the parameter set for the longitudinal and the missing data model, 


respectively. Little (1993) and Little and Rubin (2002) define two broad classes of models 


to handle (potentially) non-ignorable missing data. In the selection model (SM) formulation, 


see e.g. Heckman (1976), the joint distribution of and r* can be factorized as 


fyA r i,Yi I $ -7) = fr\yAi \ YiiAfyiVi I ®), *= 1, 


(7) 


where the conditional density [rj | y,, 7 ] describes the selection mechanism leading each unit to 
continue or stop participating in the study. In the pattern-mixture model (PMM) formulation, 
the following factorization holds 


fyA r uYi I $, 7 ) = fy\r(Yi I D, *)fr(Ti \ j), %= 1, ••• ,n . 


( 8 ) 


The rationale for PMMs is that each subject has its own propensity to drop-out from the 
study; individuals dropping-out from the study closer in time have similar propensities and 
share some common observed and/or unobserved features. Therefore, the model for the whole 
population is given by a mixture over these patterns. Further modelling alternatives (e.g. 


shared parameter models - SPMs) are reviewed by Little (1995) and Rizopoulos and Lesaffre 


(2014). In the hidden Markov literature, Bartolucci and Farcomeni (2015) discuss a model for 


multivariate longitudinal responses and a (discrete) time to event, with discrete time-varying 
and time-constant random intercepts shared by the longitudinal response and the missingness 


indicator. A pattern mixture approach has been described by Maruotti (2015), where the 


transition matrix is a function of the number of available measurements for the individual. 
According to this latter proposal, we define a PMM to account for the potential presence of 
informative missingness; we assume to be interested in the conditional quantiles of the response 
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variable and adopt a lqHMM formulation. To overcome the weak identifiability of PMMs due 


to a possibly large number of patterns (see e.g. Roy, 2003), we consider a reduced number of 
classes, representing different propensities to drop-out from the study; in the following, we will 
refer to them as latent drop-out (LDO) classes. 

Let Ti = T — Rn denote the number of measurements available for the z-th individual, 
i = 1,..., n. The latent variable Ci = (Cii> • • ■, Cic) which is used in the IqmHMM to denote 
component membership, is now used to denote the membership to a specific LDO class. Sample 
units in the same LDO class share some common latent characteristics that lead to changes in 
the covariate effect estimates. According to Roy (2003) and Roy and Daniels (2008), we assume 
that individuals with a higher propensity to remain into the study have a higher chance to 
present complete responses, i.e. they have higher values for Ti. Hence, the probability of being 
in one of the first LDO classes is described by a monotone function of the number of available 
measurements T. For a given r E (0,1), the following (proportional) ordinal regression model 
is used 

exp (X 0g + Ai Ti) 


Pr 


Ec« = i|r,,T U- 
1=1 ' 


+ exp (X 0g + Ai Ti) 


(9) 


under the constraints Aoi < ••• < Aog-i- Two issues should be noticed. First, by using a 
LDO-based approach, we often consider a reduced number of classes, and this help solve the 
weak identifiability issues common to PMMs. Second, this model specification generalizes the 
IqmHMM , since the latent variable Ci is now ordinal and the corresponding masses are defined 
to be a function of T). We assume that, conditional on Su = su and (i g = 1, longitudinal ob¬ 
servations from the same individual are independent (local independence assumption); further, 
Ci capture all the dependence between the longitudinal and missing data process; conditional 
on such latent variables, the two processes are no longer dependent. As before, longitudinal 
responses follow a conditional ALD, with location parameter defined by 


Qr{yu | su, Cig = 1,t)= x.' it f 3 + z it b g + w ' it a Sit , 


( 10 ) 


where the dependence of model parameters on r has been suppressed for ease of notation. 
Denoting by y° and y™ the observed and the missing part of the individual sequence y i5 the 
observed data likelihood for the z-th individual is given by 


£(.)(*-7 | y°,Ti,r) = VV / fyis^yilsiXig = f s (si\ S,Q,t): 

Si g =i J y? 

x n ig (Ti | A ,r)f T (Ti \ 7 ,r)dy"\ 


( 11 ) 


where zr i g (Ti | A, r) = f^xiCig = 1 | R, A, r) is the conditional probability for the z-th individual 
in the g-th LDO class, g = 1 and is defined as the difference between two adjacent 

cumulative logits, see Agresti (2010). By assuming that the parameters $ and 7 are separate, 
the marginal distribution of the missing data process friT \ 7 , r) can be left unspecified and 
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inference can be based on the conditional observed data likelihood 


_ G 

L(i)(® | y i,Ti,r) = '£2^2f y \s<;(y 0 i I s i,(ig = l,^,r)/ s (s < | <5, Q,T)TT ig (Ti 

Si g=l 


A, t). 


( 12 ) 


We may notice that expression (12) reduces to ([ 6 ]) when 'Xi g (T l | A, r) = n g , Mi = 1 
and g = 1 that is when Ai = 0 in equation ([9]). Hence, mixture components are 

introduced either as a way to approximate intractable integrals (as in IqmHMM ) or to describe 
time-constant unobserved characteristics that are related to the drop-out mechanism (as in 
IqHMM+LDO). 


5 ML estimation 


Parameter estimates for the IqmHMM and the IqHMM+LDO are obtained by using the Baum- 


Welch algorithm (MacDonald and Zucchini, 1997). In this section, we briefly sketch the algo¬ 


rithm; a more detailed discussion is available in the Supplementary material. As before, we 
suppress the r indexing of the model parameters. We will refer to LDO classes with the generic 
term “components”, to align the terminology of IqmHMM and IqHMM+LDO , and use 7 Tj g in 
place of iTi g (Ti | A, r) for the IqHMM+LDO formulation, while 7 r , 3 = ir g ,Mi = 1 for the 

IqmHMM one. Let iq(/i) = I [Su = h] denote the indicator variable for the i-th individual be¬ 
ing in the h- th state at time occasion t and uu(k, h) = I [Sn-i = k, Sa = h\ indicating whether 
he/she moves from the k- th state at time occasion t — 1 to the h- th one at time occasion t. As 
before, Cig is the indicator variable for the i-th unit belonging to the 5 -th component. Starting 
from the definition of the conditional complete data log-likelihood 

n s m Ti m G 

4($ | y,T,S,C,r) oc ^ Un(h) log 5 h + ^ ^ u it (k, h) log q kh + ^ Ci<? 1 °g 7 r *g+ 

i =1 ^ h =1 t =2 h,k =1 g=l 


Ti m G 

Ti log(o-) - EEE Til ( k ) Qigfij 
t= 1 h= 1 9=1 


Hit IMt [4/ — h, b^] 


<7 


(13) 


parameter estimates are derived by using an EM-type algorithm, that is by alternating an E- 


and a M-steps. In the E-step we take the expected value of the complete data loglikelihood (13), 


given the observed data and the current parameter estimates; we refer to this as Q($ | $ ( 'i). 
This amounts to replacing the indicator variables by the corresponding (posterior) expected 
values. Once these quantities have been computed, in the M-step, model parameter estimates 
are derived by maximizing Q($ | <l>^). Given the separability of the parameter spaces for the 
longitudinal and the missing data process, the maximization can be partitioned into indepen¬ 
dent sub-problems, which considerably simplifies the computations. See the Supplementary 
Material for details. 

The E- and the M-step of the algorithm are iterated until convergence, that is until the (rela- 










tive) difference between subsequent likelihood values is lower than an arbitrary small amount 
e. Penalized likelihood criteria, such as BIC (Schwarz, 1978), can be used to jointly iden¬ 


tify the best number of components and hidden states. As it typically happens in the linear 
quantile mixed model framework, standard errors for parameter estimates are obtained by non- 
parametric block bootstrap. This amount to resampling individual indexes and keep all the 
corresponding measurements in order to preserve the within individual dependence structure, 
see e.g. Lahiri (1999) for references. 


6 Application: Re-analysing the CD4 cell count data 

6.1 Data Description 


The models we propose are illustrated by re-analysing the CD4 cell count dataset (Zeger and 


Diggle. 1994). Data come from the Multicenter AIDS Cohort Study (MACS), started in 1984 
and involving more than 5000 volunteered gay and bisexual men from Baltimore, Pittsburgh, 
Chicago and Los Angeles. The HIV virus destroys the T-lymphocytes (CD4 cells) which play a 
vital role in immune function; the virus progression can therefore be assessed by measuring the 
number of CD4 cells, which, on average, tend to decrease throughout the incubation period. 
Among the volunteers participating in the study, 371 (7%) seroconverted during the analysed 
time window. Since the aim was to understand the impact of serconversion of serconversion 
on the dynamics of the CD4 count, we have considered in our analysis, coherently with |Zegerl 


and Diggle (1994), 2376 measurements from 369 individuals (two of them have been discarded 
due to missing values in the covariates), observed from a minimum of 3 years before to a 
maximum of 6 years after the seroconversion. The observed data suffer from attrition and, 
for each individual, the number of available measurements ranges from a minimum of 1 to a 
maximum of 12. 

The interest is at determining the effect of a set of covariates on the evolution of the CD4 cell 
counts over time while controlling for sources of unobserved heterogeneity. The set of covariates 
includes: years since seroconversion (negative values indicate the current CD4 measurement 
was taken before the seroconversion), age at seroconversion (centered around 30), smoking 
(packs per day), recreational drug use (yes or no), number of sexual partners, depression 


symptoms measured by the CES-D scale (Radloff. 1977), ranging from 0 to 60, with larger 


values indicating more severe symptoms. The analysis was conducted on the log transformed 
CD4 counts, that is log(l + CD4 count). We start the analysis by assuming a non-informative 
drop-out scenario, and compare the obtained results with those from the corresponding non- 
ignorable missing data model, the IqHMM+LDO. 


6.2 Linear quantile mixed hidden Markov model 

To model the evolution of individual trajectories over time, we have considered individual- 
specific, time-varying and time-constant, random parameters. From a preliminary exploratory 
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analysis, and by looking at the fit of a number of regression models, we have decided to focus 
on the following model specification: 

Qr(yu | s it ,hi) = AtPij) + z it h i( T ) + w i t a Sit ( T ), 

where, x# includes two continuous covariates (time since seroconversion and age), the dummy 
variable drug (baseline: no) and three discrete variables (packs of cigarette per day, number 
of sexual partners and CES-D score). The vectors associated with the time-constant and the 
time-varying random parameters, z# and w a = w,;, include the time since seroconversion and 
a column of ones, respectively. That is, the model considers a set of fixed parameters, a time- 
varying random intercept and a time-constant random slope. 

We have fit the proposed model for a varying number of hidden states and mixture compo¬ 
nents and for three quantiles, r = (0.25,0.50,0.75). To reduce the chance of being trapped 
in local maxima, we have adopted a multi-start strategy, which can be described as follows. 
Deterministic starting solutions for the initial and the transition probabilities have been ob¬ 
tained by setting 5h = 1 /m and qkh = (1 + sl(h = k))/(m + s), for a suitable constant s > 0. 
Starting values for fixed parameters have been obtained by fitting a standard linear quantile 
regression model, while for the time-varying and the time-constant random parameters, m and 
G Gaussian quadrature locations are added to the corresponding fixed effects. To obtain a set 
of random starting values, we have randomly perturbed these deterministic starts. For each 
combination of the number of hidden states and the number of mixture components, we have 
considered 30 random start, and retained the best solution according to the BIG, see Table [4| 

Table H] about here 

The results suggest to select a model with m = 4 hidden states at all the analysed quantiles, 
thus highlighting quite a strong time-varying unobserved heterogeneity. The distribution for 
the individual-specific slope associated to Time sero is approximated by a discrete distribution 
with a number of mixture components that decreases as we move towards the right tail of 
the response distribution. As it can be evinced by looking at Table [4j we choose G = 5,4, 3 
components for r = 0.25, r = 0.50 and r = 0.75, respectively. In Table [5j we report the 
parameter estimates for the longitudinal data model, with 95% confidence intervals based on 
B = 1000 block bootstrap samples. 


Table [5] about here 


As it can be noticed by looking at Table [5} for all the analysed quantiles, age appears to play 
a minor role while the packs of cigarettes and the number of sexual partners have a positive 
and significant effect on the log count of CD4 cells. At the tails of the response distribution, 
the recreational use of drugs has a positive effect while this is negligible at the median as it 


is clear by looking at the corresponding confidence intervals. According to Zeger and Diggle 
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(1994) the positive effect associated to some “risk” factors reflect a selection bias mechanism, 
where healthier men stay longer in the study and choose to continue their usual practices. 
More severe depression symptoms, indicated by higher values of the CES-D score, imply a 
slight decrease in the number of T-lymphocytes. For the time since seroconversion, the effect 
is negative; that is, the number of CD4 cells decreases with increasing lag from the date of 
seroconversion (see the Time ser o effect in Table [5]). This effect reduces when we move towards 
higher quantiles, suggesting that the progression in time of the virus is slower for healthier 
men. The estimated variance of the random parameters (pb) is significantly different from zero 
and reveals the presence of substantial individual-specific departures from the homogeneous 
effect of Time ser o on the T-lymphocyte counts; as for the number of components, also this 
variability reduces when moving towards the right tail of the response distribution. We may 
also notice that, when we move from r = 0.25 to r = 0.75, state-dependent intercept estimates 
tend to increase, and this is coherent with increasing values of baseline (log) CD4 levels. Table 
[6] reports the estimated initial and transition probabilities for the hidden Markov chain. The 
combination of these results with the intercept values reported in Table [5] helps us understand 
the evolution of CD4 cell counts over time, conditional on the observed covariates. 


Table [6] about here 


For all the analysed quantiles, the estimated initial probabilities suggest that most of the 
sample units start the study with intermediate levels of CD4 cell counts (62 + 83 > 0.70), 
while few observations start with more extreme (lower or higher) levels. For r = 0.50 and 
r = 0.75, transitions between hidden states over time are quite unlikely (qhh > 0.8) and, if any 
transition is observed, units tend to move towards states with lower values of the intercept. 
For t = 0.25, we observe a slightly different evolution of the response over time. Estimated 
transition probabilities highlight that, for less healthy men, the number of CD4 cells in the 
blood tends to repeatedly increase and decrease over the follow-up time, and this is particularly 
evident for “lower” hidden states. Transitions towards the first hidden state (with the lower 
CD4 log-count) are unlikely QC/cLi Qki < 0.15) and, if any transition to the first hidden state is 
observed, in the next occasion individuals tend to move towards states characterized by higher 
CD4 cell count levels (gn = 0.284), that is, the sudden decrease to the fist hidden state is just 
temporary. 


6.3 Linear quantile hidden Markov model with latent drop-out classes 

As we have already discussed, each individual has been observed from a minimum of 3 years 
before to 12 years after seroconversion. Table [7] shows the number of individuals remaining 
in the study at each measurement occasion. Only a small portion of individuals has been 
observed until the end of the follow-up time. 

Table m about here 
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We report in Figure [l] the distribution of the response variable at each time occasion, stratified 
by whether or not units drop-out from the study between the current and the subsequent 
measurement; as it is clear, CD4 levels tend to suddenly decrease just before the units drop¬ 
out of the study, especially in the first measurements occasions. 

Figure [l] about here 

These findings suggest that healthier individuals tend to stay longer into the study and that a 
potential dependence between the longitudinal and the missing data process may be present. 
As we have already highlighted, individual-specific heterogeneity in the slope for Time aer o 
decreases when moving from the first to the third quartile (that is when moving from sicker 
to healthier men); here, we aim at analysing if such changes may somehow be related to the 
missing data process. For this purpose, wee have defined the following IqHMM+LDO 

QAyit I Sit , Cig = 1) = x'f/3(r) + z' t b g(r) + w' t a«.. t(r) , 

to compare the results with those we have obtained by the MAR counterpart, the IqmHMM 
we have discussed before. The vectors x#, and w n are defined as in the IqmHMM specifi¬ 
cation and parameters have all the same interpretation but the random slope for Time ser o> 
which is now assumed to vary across LDO classes. We have fit the proposed model for 
r = (0.25,0.50,0.75). To avoid local maxima, model parameters have been initialized via 
the same multi-start strategy we have described for the IqmHMM. Initial values of the missing 
data model parameters have been obtained by fitting an ordered logit model on the discretized 
times to drop-out, randomly perturbed to avoid infinite estimates for the 7) effect. For each 
combination of the number of hidden states and LDO classes, we have considered 30 starting 
points and retained the solution with the best BIC value. Results are reported in Table [8} 

Table [H about here 

We select the model with m = 5 hidden states and G = 5 LDO classes when modelling the 
first quartile of the response distribution (r = 0.25); for the median and the third quartile, the 
solution with m = 4 and G = 4 provides the lowest BIC values. By looking at the parameter 
estimates for the LDO class model at r = 0.75, we noticed that two Ao s estimates do not 
significantly differ from zero and the corresponding confidence intervals overlap. Therefore, to 
avoid spurious solutions, we have decided to search for the optimal number of classes within 
the set of models with G < 3. As a result, for t = 0.75, the best fit corresponds to m = 4 
hidden states and G = 3 LDO classes. In Table [9j we report the estimated parameters (state- 
dependent intercepts and fixed slopes) for the longitudinal data model with corresponding 95% 
confidence intervals, in parentheses, based on B = 1000 block bootstrap samples. 

Table [9] about here 
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If we compare results in Table [9] with those obtained by the IqmHMM and reported in Table 
[5j we may observe only slight changes in the fixed parameter estimates for Packs and Time ser o 
at r = 0.25, while all other fixed parameters seem unchanged. As expected, estimates of the 
state-dependent intercept increase when moving from the left to the right tail of the response 
distribution. By combining these results with the estimated initial and transition probabilities 
reported in Table [T0[ we notice that we have obtained findings that are similar to those we have 
discussed for the IqmHMM specification. Only for r = 0.25 we observe a further state with a 
lower intercept estimate that seem to be linked to highly variable dynamics of the response for 
units dropping-out early from the study; for r = {0.50,0.75}, differences seem to be negligible. 

Table ESI about here 


In Table 11, we show the estimated LDO-dependent locations side by side with the location 
estimates for the IqmHMM specification. 


TableEH about here 


In both models, the effect of Time ser o on the CD4 cell count is negative, significant and the 
estimate reduces when moving from r = 0.25 to r = 0.75. Location estimates suggest that 
individuals belonging to “lower” mixture/LDO components have a steep reduction in the (log) 
number of T-lymphocytes when the time since seroconversion increases. This effect progres¬ 
sively reduces for units belonging to “higher” latent categories. The results obtained from the 
IqHMM+LDO can be further explored by looking at the LDO class model estimates reported 
For all the analysed quantiles, the negative effect of the time to drop-out (Ai < 0) 


in Table 
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suggests that “lower” LDO classes identify groups of individuals with shorter longitudinal se¬ 
quences: the probability of belonging to one of the first g classes reduces when the number of 
available measures increases. 


Table d2] about here 

These results are clearer when looking at Figure [2j where we report the longitudinal trajectories 
of individuals classified into the different LDO components under the IqHMM+LDO formula¬ 
tion via a MAP rule for r = (0.25,0.50,0.75). Local polynomial regression curves (blue lines), 
95% confidence intervals (gray bands) and mean values (blue dots) are reported to highlight 
the general trend. Due to the missing data process, wider confidence intervals are observed 
for the last measurement occasions. As it is clear, higher LDO classes correspond to longer 
longitudinal sequences. 

We use different colors to highlight those individuals that, under the IqmHMM formulation, 
have not been classified in the same component as in the IqHMM+LDO one. Red and yel¬ 
low trajectories identify individuals that have been moved forward (higher classes, red) and 
backward (lower classes, yellow), respectively, when estimating the IqHMM+LDO. Black tra¬ 
jectories represent the evolution over time of the CD4 count levels for those individuals that 
have been classified likewise by the two model specifications. 
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Figure [2] about here 


By looking at these Hgures, it is clear that, generally, the two models offers quite a similar clas¬ 
sification when we look at the component membership for r = 0.50 and r = 0.75 are concerned. 
As we have highlighted before, the effect of the missing data process on the longitudinal re¬ 
sponse is quite negligible when considering individuals in better health conditions, even if some 
anomalies still seem to be corrected when modelling the missing data process: some individuals 
with longer (respectively shorter) sequences and weaker (respectively stronger) reduction of 
the response values over time are moved in higher (respectively lower) LDO classes under the 
lqHMM+LDO formulation. 

As regards r = 0.25 (i.e. for responses associated with less healthy individuals, which often rep¬ 
resent the main target of inference), classifications supplied by the two modelling approaches 
seem to be quite different, thus confirming the stronger impact of the missing data process 
on the first quartile of the response distribution. Individuals are mainly shifted in lower LDO 
classes when compared to IqmHMM results. These classes are characterized both by shorter 
longitudinal sequences and by a stronger impact (especially in the last observed occasions) of 
Time sero on the CD4 count levels (also stronger than those identified by IqmHMM) which seem 
to be coherent with the individual path shown in the first panel of Figure [2} 

These results, together with the lower BIC values obtained under lqHMM+LDO (see Tables 
[I]j8]), suggest a better fit and, thus, render lqHMM+LDO an interesting modelling solution for 
the analysis of such kind of data, especially when we look at the first quartile of the response 
distribution. The observed increase in the log-likelihood values we obtain when moving from 
IqmHMM to lqHMM+LDO could be ascribed to a more flexible structure for the component 
priors; in the IqmHMM these are constant across individuals while, in the lqHMM+LDO , they 
depend on individual-specific features. In our formulation, we assume that such features are 
connected to a differential propensity to stay in the study, which, in turn, is summarized by 
T,-. Nevertheless, this propensity is unobservable and, therefore, we may not conclude that the 
missing data process is truly informative, since 7) could represent other, unobserved, individual 
characteristics that are not linked to the propensity to drop-out. A sensitivity analysis to check 
for non-ignorability of the missing data generating process represents a further step to validate 
the model. 


7 Simulation study 


To study the performance of the proposed models, we have implemented a large scale simula¬ 
tion study made up by two different scenarios. First, to evaluate the empirical behaviour of 
IqmHMM , we have considered a scenario (Scenario 1) with completely observed longitudinal 
responses and compared our proposal with the lqHMM specification we obtain when setting 
G = 1. This would reduce extra-variability due to incoherence between the code we have de¬ 


veloped for IqmHMM and the code developed by Farcomeni (2012) for lqHMM. The simulation 
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scheme and the results we have obtained under Scenario 1 are detailed in the Supplementary 
Material. A second scenario (Scenario 2) has been considered to assess how IqHMM+LDO be¬ 
haves when a non-ignorable missing data process affect the longitudinal responses with respect 
to the corresponding MAR counterpart, that is the IqmHMM. Two different intensities for the 
relationship between the drop-out and the time spent into the study have been considered in 
order to capture differences between the two model specifications. 

7.1 Simulation Scenario 2: partially complete data 

Data have been generated from a mixed hidden Markov model with two hidden states (m = 2) 
and three latent drop-out classes (G = 3). We have considered longitudinal measures on a 
sample of n = 100, 200 individuals at T = 5,10 equally spaced measurement occasions; some 
individuals drop-out from the study before the planned end, thus presenting incomplete data 
records. To simulate the time to drop-out, Tj, we have considered a discrete distribution with 
Pr(Tj = j) = 1/(T— 1 ),j = 1,...,T, meaning that approximately only 25% (T = 5) and 11% 
(T = 10) of the enrolled individuals do present complete data records. Initial and transition 
probabilities for the hidden Markov chain have been fixed to 

/ 0.7 0.3\ 

S = (0.7,0.3) and Q = , (14) 

^0.3 0.7 J 

and two different sets of A parameters have been considered for the missing data process. The 
former set, Aoi = 5, A 02 = 8.5, Ai = —1.1, has been chosen so that class probabilities are 
strongly related to the drop-out time (“high informative drop-out scenario”). The latter set, 
Aoi = 1, A 02 = 2.75, Ai = —0.3, implies that the drop-out time does not strongly influence the 
LDO class membership (“low informative drop-out scenario”). For the longitudinal observa¬ 
tions, the following regression model holds for the h- th state of the Markov chain and the 5 -th 
LDO class: 

Y it = a h + b g Xiti + fi x it 2 + e it , (15) 

where (3 = —0.8, xm ~ N(l,3) and x ^2 ~ Unif[0,10]. State-dependent intercepts have been 
set to ai = 100 and «2 = 102.5, while LDO-dependent parameters have been set to b\ = 
0 . 5,62 = 1-5, 63 = 3. The difference between state-specific intercepts, that is ( 011 , 012 ), has been 
set to a lower value than the one considered in Scenario 1, to verify whether aliasing between 
the as and the b g s may occur in such a scenario. 

Also, we have considered different probability distributions to generate the measurement error, 
that is a standard Gaussian distribution, a Student 1 3 distribution, and a distribution, where 
the latter two scenarios allow for heavy tailed and skewed data, respectively. 

We have generated B = 250 samples and model parameters have been estimated for three 
quantiles t = (0.25,0.50,0.75). To evaluate the model performance, the bias and root mean 
square error (RMSE), over simulations, have been computed for each model parameter. 
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As no significant differences have been found between results for n = 100 and n = 200, Tables 
[l](3]show only results obtained under the former scenario for r = 0.25, r = 0.50 and r = 0.75, 
respectively; results for n = 200 are available from the authors upon request. 

Before going into details, it is worth to contextualize the simulation study and highlight what 
we expect. As it can be easily observed, the models we are comparing share the same linear 
predictor and the same overall structure but for the mixture component probabilities. In the 
IqHMM+LDO formulation, these are directly related to the time each individual spent into the 
study, while, in the IqmHMM one, they are constant over individuals. Therefore, we expect 
that differences between the two approaches will be negligible in the “low informative drop¬ 
out scenario”, that is when setting Ai = —0.3, and more evident in the other one, defined 
by Ai = —1.1. Indeed, in this latter case, the components of the finite mixture are closely 
related to Tj; and, therefore, IqHMM+LDO should be able to recover more accurately individual 
memberships to LDO classes and ensure more stable estimates for the component-specific 
locations b g . Obviously, with increasing T, the effect of the different mixture components 
become clearer and, as a result, we should observe that behaviours of the IqmHMM estimates 
approach those of IqHMM+LDO. 


Tables [TTI31 about here 

As expected, in all the considered simulation scenarios, the quality of parameter estimates 
increases as the number of available measures increases both for IqmHMM and IqHMM+LDO. 
The fixed parameter j3 in the longitudinal data model is always estimated with higher accu¬ 
racy than the parameters associated to time-constant or time-varying latent variables. Higher 
RMSEs are generally observed when moving far from the center of the response distribution, 
especially for the state-dependent intercepts. According to the data generating proce¬ 
dure, these parameters are directly related to the r-th quantile value of the error 
distribution and, therefore, in low density regions, such as in the tails, the qual¬ 
ity of results generally decreases. With increasing sample sizes and number of 
measurement occasions the quality of results seems to improve, but for the a 2 
parameter in the y| case for T = 0.75. Here, both IqHMM+LDO and IqmHMM 
algorithms seem to face some difficulties in recovering the true effect, possibly 
due to some aliasing between time-constant and time-varying random parameters. 
However, based on the parameters we have considered for the simulation, the first 
state of the hidden Markov chain is the most likely one; as a result a\ is generally 
estimated with some more accuracy than 02 - Lower RMSE are obtained in the 
case of Gaussian errors when compared to the heavy tailed and the skewed case 
because of a reduced amount of information, except for the first quartile (r = 0.25), 
where y 2 distributed random errors ensure more information that lead to a higher 
quality estimates. 

When comparing results obtained by fitting IqHMM+LDO with those coming from the IqmHMM 


16 


formulation, better results are generally obtained, both in terms of bias (more evidently) and 
variability. As expected, differences are more noticeable in the “high informative drop-out sce¬ 
nario” and for those parameters related to the LDO classes. Based on the chosen values for b g s, 
individuals dropping-out earlier in time (i.e. belonging to the first LDO classes) are those with 
lower values of the longitudinal outcome. Therefore, the missing data process mostly influences 
the left tail of the response variable distribution (as in the CD4 data example). Therefore, in 
the left tail, the distinction between LDO classes is more evident and overlapping between com¬ 
ponents is less likely. This can somehow explain the reduced differences between IqHMM+LDO 
and IqmHMM , when moving from the left to the right tail of the response distribution. Figures 
[3]|4]show the distribution of the adjusted rand indexes over the simulations comparing the true 
LDO class membership and the estimated allocations obtained through the IqHMM+LDO and 
the IqmHMM. 

Figures [3]|4] about here 

As it is clear, when the missing data process is directly taken into consideration, the estimated 
LDO class memberships are much more reliable for all the considered simulation scenarios 
and all the estimated quantiles. This result confirms what we have already noticed when 
analysing the CD4 data. Although differences between estimated parameters are negligible, 
the IqHMM+LDO formulation offers more homogeneous groups which, thanks to the estimated 
A parameters in the LDO class model, are also easier to be interpreted. 


8 Concluding remarks 


In this manuscript we have discussed a class of mixed hidden Markov quantile regression 
models for longitudinal continuous responses; a general dependence structure is considered by 
allow the measurement from each statistical units share both time-varying and time constant 


random parameters, thus providing an extension to the models proposed by Geraci and Bottai 


(2007) and Farcomeni (2012). Both unobserved heterogeneity sources are modelled by using 


discrete distribution, in a nonparametric fashion, in order to produce a conditional model 
which should be robust under a series of empirical situations, as shown in the simulation 
study. Since unobserved heterogeneity in this context may arise due to omitted covariates 
or be influenced by patterns of drop-out, we allow the time-constant random parameters to 
have a distribution which is dependent on the observed pattern of drop-out (ie the number of 
measurements for each individual) through a drop-out related ordered latent class, as suggested 


by Roy (2003) and Roy and Daniels (2008). The simulation study and the re-analysis of a well 


known benchmark dataset, the CD4 cells data of Kaslow et al. (1987), give quite encouraging 


results, showing how the proposed models can be easily applied to complex data structures. 
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Tables 


Table 1: Simulation study for r = 0.25. Bias and R.MSE for longitudinal parameter estimates 
under the IqmHMM and IqHMM+LDO formulation. 
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IqmHMM 
T = 5 

-0.001 (0.25) 
-0.606 (0.83) 
0.003 (0.04) 
0.000 (0.05) 
-0.001 (0.05) 
-0.004 (0.06) 
T = 10 
0.099 (0.25) 
0.463 (0.68) 
-0.001 (0.03) 
-0.010 (0.05) 
-0.015 (0.04) 
-0.019 (0.04) 

T = 5 

-0.106 (0.33) 
-0.913 (1.08) 
0.002 (0.04) 
-0.007 (0.08) 
-0.051 (0.22) 
-0.272 (0.63) 
T = 10 
-0.076 (0.23) 
-0.407 (0.62) 
0.000 (0.03) 
0.011 (0.05) 
0.015 (0.04) 
0.017 (0.04) 


N 

IqHMM+LDO 


0.026 (0.27) 
-0.628 (0.85) 
-0.008 (0.04) 
-0.002 (0.05) 
-0.004 (0.05) 
-0.005 (0.06) 


-0.071 (0.23) 
-0.427 (0.69) 
-0.001 (0.03) 
0.007 (0.05) 
0.002 (0.10) 
0.008 (0.10) 


-0.075 (0.32) 
-0.897 (1.07) 
0.001 (0.04) 
0 (0.05) 
-0.02 (0.15) 
-0.26 (0.59) 


-0.035 (0.22) 
-0.375 (0.59) 
-0.002 (0.03) 
0.008 (0.05) 
0.013 (0.04) 
0.013 (0.04) 


h 

IqmHMM IqHMM+LDO 


0.014 (0.39) 
-0.998 (1.19) 
-0.001 (0.05) 
-0.003 (0.06) 
0.006 (0.06) 
0.002 (0.08) 


0.059 (0.35) 
-0.984 (1.19) 
-0.002 (0.05) 
-0.006 (0.06) 
0.005 (0.06) 
-0.001 (0.07) 


0.188 (0.41) 
0.884 (1.13) 
-0.004 (0.04) 
-0.006 (0.06) 
-0.003 (0.10) 
-0.003 (0.09) 


-0.236 (0.59) 
-0.867 (1.12) 
0.001 (0.04) 
0.004 (0.06) 
0.004 (0.08) 
-0.002 (0.09) 


-0.137 (0.46) 
-1.164 (1.32) 
0.005 (0.05) 
- 0.020 ( 0 . 11 ) 
-0.055 (0.24) 
-0.275 (0.63) 


-0.099 (0.47) 
-1.143 (1.30) 
0.002 (0.05) 
-0.004 (0.05) 
- 0.001 ( 0 . 10 ) 
-0.238 (0.57) 


-0.124 (0.34) 
-0.824 (1.04) 
0.002 (0.04) 
-0.004 (0.07) 
0.009 (0.05) 
0.007 (0.04) 


-0.071 (0.31) 
-0.756 (0.99) 
0.001 (0.04) 
-0.006 (0.07) 
0.007 (0.05) 
0.004 (0.04) 


IqmHMM 


xl 


IqHMM+LDO 


-0.042 (0.20) 
-0.090 (0.24) 
0.000 (0.03) 
-0.005 (0.03) 
-0.007 (0.04) 
0.005 (0.05) 


-0.032 (0.20) 
-0.087 (0.25) 
0.001 (0.03) 
-0.006 (0.03) 
-0.008 (0.04) 
0.002 (0.05) 


-0.105 (0.43) 
-0.153 (0.38) 
0.002 ( 0 . 02 ) 
0.006 (0.06) 
0.004 (0.10) 
0.005 (0.13) 


-0.094 (0.16) 
-0.142 (0.22) 
0.001 ( 0 . 02 ) 
0.005 (0.03) 
0.004 (0.02) 
0.003 (0.02) 


-0.108 (0.21) 
-0.185 (0.37) 
0.002 (0.03) 
0.002 (0.06) 
-0.007 (0.12) 
-0.177 (0.44) 


-0.093 (0.21) 
-0.195 (0.40) 
0 (0.03) 
0.006 (0.03) 
0.006 (0.04) 
-0.166 (0.42) 


-0.095 (0.15) 
-0.145 (0.23) 
- 0.001 ( 0 . 02 ) 
0.004 (0.03) 
0.003 (0.02) 
0.004 (0.02) 


-0.078 (0.14) 
-0.121 (0.18) 
-0.001 (0.02) 
0.003 (0.03) 
0.002 (0.02) 
0.001 (0.02) 


21 






Table 2: Simulation study for r = 0.50. Bias and R.MSE for longitudinal parameter estimates 
under the IqmHMM and IqHMM+LDO formulation. 




N 


h 

xl 


IqmHMM 

IqHMM+LDO 

IqmHMM 

IqHMM+LDO IqmHMM 

IqHMM+LDO 


T = 5 

ai 0.011 (0.25) 

-0.076 (0.24) 

0.017 (0.28) 

-0.060 (0.27) -0.218 (0.50) 

-0.260 (0.46) 


a 2 -0.022 (0.74) 

-0.255 (0.40) 

-0.092 (0.48) 

-0.271 (0.46) -0.130 (0.72) 

-0.322 (0.46) 


P 0.001 (0.04) 

-0.002 (0.04) 

0.001 (0.04) 

0.000 (0.04) 0.002 (0.05) 

0.003 (0.05) 

CO 

b ! 0.009 (0.05) 

0.016 (0.05) 

0.007 (0.06) 

0.007 (0.05) -0.002 (0.08) 

-0.016 (0.05) 

1 

b 2 -0.078 (0.27) 

-0.003 (0.13) 

-0.071 (0.25) 

-0.004 (0.11) -0.058 (0.19) 

-0.028 (0.11) 

in 

63 -0.180 (0.48) 

-0.012 (0.19) 

-0.148 (0.45) 

0.000 (0.10) -0.097 (0.33) 

-0.021 (0.15) 

T—1 

T = 10 





II 

ai 0.016 (0.23) 

0.060 (0.18) 

0.023 (0.30) 

0.077 (0.21) -0.248 (0.46) 

-0.107 (0.58) 

< 

a 2 -0.116 (0.24) 

-0.085 (0.23) 

-0.097 (0.25) 

-0.058 (0.23) -0.339 (0.44) 

-0.327 (0.49) 


P 0.001 (0.03) 

0.000 (0.03) 

0.000 (0.03) 

0.000 (0.03) -0.002 (0.03) 

0.003 (0.03) 


61 0.010 (0.06) 

-0.002 (0.04) 

0.009 (0.09) 

-0.004 (0.04) 0.014 (0.14) 

-0.013 (0.05) 


b 2 0.003 (0.04) 

-0.003 (0.03) 

0.009 (0.10) 

-0.003 (0.04) 0.010 (0.16) 

-0.015 (0.05) 


63 0.003 (0.04) 

-0.002 (0.03) 

0.001 (0.03) 

-0.005 (0.03) -0.010 (0.04) 

-0.002 (0.04) 


T = 5 

ai 0.041 (0.25) 

0.066 (0.26) 

0.056 (0.26) 

0.081 (0.27) -0.071 (0.55) 

0.603 (0.63) 


a 2 -0.148 (0.42) 

-0.131 (0.35) 

-0.096 (0.40) 

-0.080 (0.40) -0.159 (0.67) 

0.468 (0.52) 


P 0.000 (0.04) 

0.000 (0.04) 

0.001 (0.04) 

0.001 (0.04) -0.004 (0.05) 

0.002 (0.03) 


61 0.004 (0.05) 

0.000 (0.05) 

0.002 (0.05) 

-0.002 (0.05) -0.012 (0.07) 

0.003 (0.05) 

1—1 

T—1 

b 2 -0.042 (0.17) 

-0.008 (0.06) 

-0.013 (0.07) 

-0.009 (0.06) -0.018 (0.09) 

0.006 (0.04) 

1 

in 

63 -0.298 (0.67) 

-0.282 (0.64) 

-0.191 (0.51) 

-0.193 (0.50) -0.096 (0.47) 

0.008 (0.03) 

00 

in 

T = 10 





ii 

ai -0.118 (0.41) 

-0.047 (0.21) 

-0.064 (0.29) 

-0.011 (0.19) -0.395 (0.52) 

-0.340 (0.50) 


a 2 -0.212 (0.46) 

-0.206 (0.33) 

-0.185 (0.32) 

-0.146 (0.26) -0.487 (0.55) 

-0.475 (0.53) 


P -0.002 (0.02) 

-0.003 (0.02) 

0.003 (0.03) 

0.002 (0.03) 0.001 (0.03) 

0.000 (0.03) 


bi 0.012 (0.05) 

0.011 (0.05) 

0.004 (0.05) 

0.003 (0.05) 0.001 (0.05) 

0.005 (0.05) 


b 2 0.011 (0.07) 

0.014 (0.04) 

0.007 (0.04) 

0.006 (0.04) 0.005 (0.04) 

0.007 (0.04) 


63 0.014 (0.09) 

0.017 (0.04) 

0.011 (0.03) 

0.008 (0.03) 0.009 (0.04) 

0.007 (0.04) 
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Table 3: Simulation study for r = 0.75. Bias and R.MSE for longitudinal parameter estimates 
under the IqmHMM and IqHMM+LDO formulation. 


ot\ 

a.2 


ro 

o 

I 

LO' 

N; 

CN 


P 

bi 

b-2 

b S 


ai 

02 

P 

bi 

b-2 

b S 


Oil 

012 


in 

oo 

in 

II 


P 

bi 

b 2 

ki 


ai 

012 


P 

bi 

b 2 

b S 


IqmHMM 
T = 5 

-0.001 (0.25) 
-0.606 (0.83) 
0.003 (0.04) 
0.000 (0.05) 
-0.001 (0.05) 
-0.004 (0.06) 
T = 10 
-0.099 (0.25) 
-0.463 (0.68) 
0.001 (0.03) 
0.010 (0.05) 
0.015 (0.04) 
0.019 (0.04) 

T = 5 

-0.106 (0.33) 
-0.913 (1.08) 
0.002 (0.04) 
-0.007 (0.08) 
-0.051 (0.22) 
-0.272 (0.63) 
T = 10 
-0.076 (0.23) 
-0.407 (0.62) 
0.000 (0.03) 
0.011 (0.05) 
0.015 (0.04) 
0.017 (0.04) 


N 

IqHMM+LDO 


0.026 (0.27) 
-0.628 (0.85) 
-0.008 (0.04) 
-0.002 (0.05) 
-0.004 (0.05) 
-0.005 (0.06) 


-0.100 (0.16) 
-0.149 (0.21) 
0.000 ( 0 . 02 ) 
0.005 (0.03) 
0.003 (0.03) 
0.007 (0.02) 


-0.075 (0.32) 
-0.897 (1.07) 
0.001 (0.04) 
0.000 (0.05) 
-0.020 (0.15) 
-0.260 (0.59) 


-0.035 (0.22) 
-0.375 (0.59) 
-0.002 (0.03) 
0.008 (0.05) 
0.013 (0.04) 
0.013 (0.04) 


h 

IqmHMM IqHMM+LDO 


0.014 (0.39) 
-0.998 (1.19) 
-0.001 (0.05) 
-0.003 (0.06) 
0.006 (0.06) 
0.002 (0.08) 


0.059 (0.35) 
-0.984 (1.19) 
-0.002 (0.05) 
-0.006 (0.06) 
0.005 (0.06) 
-0.001 (0.07) 


0.188 (0.41) 
0.884 (1.13) 
-0.004 (0.04) 
-0.006 (0.06) 
-0.003 (0.10) 
-0.003 (0.09) 


-0.236 (0.59) 
-0.867 (1.12) 
0.001 (0.04) 
0.004 (0.06) 
0.004 (0.08) 
-0.002 (0.09) 


-0.137 (0.46) 
-1.164 (1.32) 
0.005 (0.05) 
- 0.020 ( 0 . 11 ) 
-0.055 (0.24) 
-0.275 (0.63) 


-0.099 (0.47) 
-1.143 (1.30) 
0.002 (0.05) 
-0.004 (0.05) 
- 0.001 ( 0 . 10 ) 
-0.238 (0.57) 


-0.124 (0.34) 
-0.824 (1.04) 
0.002 (0.04) 
-0.004 (0.07) 
0.009 (0.05) 
0.007 (0.04) 


-0.071 (0.31) 
-0.756 (0.99) 
0.001 (0.04) 
-0.006 (0.07) 
0.007 (0.05) 
0.004 (0.04) 


IqmHMM 


xl 


IqHMM+LDO 


-0.042 (0.20) 
-0.090 (0.24) 
0.000 (0.03) 
-0.005 (0.03) 
-0.007 (0.04) 
0.005 (0.05) 


-0.032 (0.20) 
-0.087 (0.25) 
0.001 (0.03) 
-0.006 (0.03) 
-0.008 (0.04) 
0.002 (0.05) 


-0.105 (0.43) 
-0.153 (0.38) 
0.002 ( 0 . 02 ) 
0.006 (0.06) 
0.004 (0.10) 
0.005 (0.13) 


-0.094 (0.16) 
-0.142 (0.22) 
0.001 ( 0 . 02 ) 
0.005 (0.03) 
0.004 (0.02) 
0.003 (0.02) 


-0.108 (0.21) 
-0.185 (0.37) 
0.002 (0.03) 
0.002 (0.06) 
-0.007 (0.12) 
-0.177 (0.44) 


-0.093 (0.21) 
-0.195 (0.40) 
0.000 (0.03) 
0.006 (0.03) 
0.006 (0.04) 
-0.166 (0.42) 


-0.095 (0.15) 
-0.145 (0.23) 
- 0.001 ( 0 . 02 ) 
0.004 (0.03) 
0.003 (0.02) 
0.004 (0.02) 


-0.078 (0.14) 
-0.121 (0.18) 
-0.001 (0.02) 
0.003 (0.03) 
0.002 (0.02) 
0.001 (0.02) 
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Table 4: CD4 data. BIC values for lqrnHMM for different choices of m and G at different 
quantiles. 


Hidden States 

1 

2 

Mixture Components 

3 4 

5 

6 

r = 

0.25 







i 


3940.50 

3530.48 

3408.42 

3312.98 

3286.67 

3298.81 

2 


3292.25 

2923.33 

2780.49 

2772.64 

2760.39 

2772.33 

3 


2963.50 

2747.21 

2676.26 

2637.39 

2619.05 

2627.48 

4 


2757.55 

2660.63 

2541.63 

2510.71 

2478.88 

2487.22 

5 


2688.98 

2527.39 

2504.99 

2494.63 

2486.83 

2507.90 

r = 

0.50 







i 


3434.26 

3014.58 

2898.11 

2850.34 

2833.41 

2874.04 

2 


2733.04 

2522.56 

2412.12 

2381.26 

2374.18 

2381.82 

3 


2523.15 

2345.76 

2280.27 

2266.98 

2256.28 

2268.15 

4 


2410.07 

2268.98 

2236.89 

2233.57 

2236.29 

2236.84 

5 


2377.80 

2298.48 

2272.29 

2266.93 

2267.97 

2278.48 

r = 

0.75 







i 


3491.69 

3134.31 

2986.87 

2951.28 

2951.44 

2940.33 

2 


2823.44 

2549.81 

2495.15 

2455.24 

2441.21 

2444.51 

3 


2470.11 

2337.64 

2290.17 

2242.39 

2244.89 

2256.24 

4 


2370.11 

2307.19 

2228.76 

2231.06 

2240.10 

2230.01 

5 


2356.16 

2295.80 

2251.37 

2252.42 

2252.06 

2274.55 


Table 5: CD4 data. Estimated longitudinal model parameters for lqrnHMM at different quan¬ 
tiles. 95% bootstrap confidence intervals are reported within brackets. 



[m 

r = 0.25 
= 4,G = 5] 

[m 

r = 0.50 
= 4,G = 4] 

[m 

r = 0.75 

= 4, G = 3] 

a i 

5.593 

(5.403; 5.677) 

6.054 

(5.994; 6.133) 

6.203 

(6.071; 6.273) 

a 2 

6.124 

(6.066; 6.166) 

6.432 

(6.368; 6.530) 

6.580 

(6.517; 6.628) 

o 3 

6.540 

(6.489; 6.587) 

6.750 

(6.689; 6.837) 

6.876 

(6.804; 6.934) 


6.915 

(6.847; 6.995) 

7.055 

(7.023; 7.231) 

7.256 

(7.168; 7.373) 

Age 

0.000 

(-0.004; 0.002) 

0.004 

(-0.001; 0.008) 

0.000 

(-0.005; 0.005) 

Drugs 

0.044 

(0.000; 0.092) 

0.057 

(-0.014; 0.110) 

0.061 

(0.003; 0.113) 

Packs 

0.056 

(0.041; 0.071) 

0.043 

(0.015; 0.054) 

0.044 

(0.015; 0.062) 

Partners 

0.006 

(0.001; 0.012) 

0.005 

(0.001; 0.012) 

0.011 

(0.003; 0.016) 

CES-D 

—0.004 

(-0.005;-0.001) 

-0.004 

(-0.006;-0.002) 

-0.004 

(-0.006;-0.002) 

Time sero 

-0.175 

(-0.206;-0.150) 

-0.140 

(-0.164;-0.114) 

-0.123 

(-0.145;-0.102) 


0.219 

(0.200; 0.360) 

0.134 

(0.105; 0.165) 

0.102 

(0.088; 0.133) 
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Table 6: CD4 data. Estimated initial and transition probabilities for lqrnHMM, at different quantiles. 95% bootstrap confidence intervals 
are reported within brackets. 
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0.019 (0.000; 0.048) 0.000 (0.000; 0.044) 0.088 (0.007; 0.192) 0.894 (0.782; 0.965) 









Table 7: CD4 data. Number of individuals in the study at each time occasion. 


Visit 1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

11 

12 

369 

364 

340 

315 

268 

225 

173 

133 

92 

54 

33 

10 


Table 8: CD4 data. BIC values for lqHMM+LDO for different choices of m and G at different 
quantiles. 




LDO classes 



Hidden States 

1 

2 3 

4 

5 


r = 0.25 


1 


3940.50 

3525.57 

3400.96 

3305.67 

3279.57 

2 


3292.25 

2919.41 

2773.66 

2761.84 

2755.28 

3 


2963.50 

2741.63 

2669.37 

2636.84 

2612.29 

4 


2757.55 

2660.38 

2537.89 

2509.15 

2522.63 

5 


2688.98 

2551.40 

2522.53 

2474.39 

2460.52 

r = 

0.50 






i 


3434.26 

3010.15 

2895.63 

2847.35 

2829.65 

2 


2733.04 

2517.26 

2406.67 

2377.40 

2369.86 

3 


2523.15 

2343.65 

2280.14 

2266.03 

2259.80 

4 


2410.07 

2265.06 

2233.48 

2231.14 

2233.14 

5 


2377.80 

2291.68 

2265.61 

2259.65 

2244.27 

r = 

0.75 






i 


3491.69 

3137.21 

2987.71 

2953.09 

2953.71 

2 


2823.44 

2551.39 

2491.32 

2453.73 

2448.10 

3 


2470.11 

2335.57 

2287.38 

2240.26 

2242.14 

4 


2370.11 

2308.85 

2225.93 

2203.98 

2223.70 

5 


2356.17 

2290.69 

2248.35 

2240.17 

2242.98 
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Table 9: CD4 data. Estimated longitudinal model parameters for lqHMM+LDO at different 
quantiles. 95% bootstrap confidence intervals are reported within brackets. 



[m 

r = 0.25 
= 5, G = 5] 

[m 

r = 0.50 
= 4,G = 4] 

[m 

r = 0.75 

= 4, G = 3] 

a i 

5.046 

(3.937; 5.286) 

6.043 

(5.931; 6.114) 

6.198 

(6.069; 6.282) 

0,2 

5.880 

(5.730; 5.918) 

6.416 

(6.323; 6.502) 

6.579 

(6.512; 6.628) 

03 

6.193 

(6.126; 6.256) 

6.719 

(6.647; 6.825) 

6.872 

(6.801; 6.934) 

0^4 

6.582 

(6.508; 6.634) 

7.040 

(6.973; 7.215) 

7.243 

(7.167; 7.370) 

05 

6.936 

(6.846; 7.026) 





Age 

-0.004 

(-0.007; 0.000) 

0.004 

(-0.001; 0.007) 

0.000 

(-0.004; 0.005) 

Drugs 

0.048 

(-0.013; 0.124) 

0.072 

(-0.006; 0.145) 

0.064 

(0.007; 0.115) 

Packs 

0.032 

(0.024; 0.051) 

0.042 

(0.014; 0.054) 

0.044 

(0.018; 0.064) 

Partners 

0.011 

(0.005; 0.016) 

0.005 

(0.000; 0.012) 

0.011 

(0.002; 0.016) 

CES-D 

-0.003 

(-0.006;-0.001) 

-0.004 

(-0.006;-0.002) 

-0.004 

(-0.006;-0.002) 

Time sero 

-0.157 

(-0.187;-0.127) 

-0.146 

(-0.175;-0.119) 

-0.131 

(-0.155;-0.108) 
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0.018 (0.000; 0.045) 0.000 (0.000; 0.045) 0.091 (0.012; 0.207) 0.891 (0.765; 0.960) 









Table 11: CD4 data. Estimated LDO-dependent parameters in the longitudinal data model for 
lqHMM+LDO at different quantiles. 95% bootstrap confidence intervals are reported within 
brackets. 


lqHMM+LDO lqmHMM 


r = 0.25 


bi 

—0.849 

(-1.568; 

-0.802) 

-0.740 

(-1.079; 

-0.662) 

b 2 

-0.434 

(-0.447; 

-0.401) 

-0.300 

(-0.324; 

-0.256) 

h 

-0.220 

(-0.245; 

-0.203) 

-0.164 

(-0.181; 

-0.133) 

bi 

-0.123 

(-0.141; 

-0.099) 

-0.053 

(-0.095; 

-0.035) 


-0.020 

(-0.041; 

-0.004) 

0.026 

(-0.011; 0.045) 

T = 

0.50 






h 

-0.502 

(-0.617; 

-0.370) 

-0.497 

(-0.667; 

-0.452) 

b 2 

-0.175 

(-0.204; 

-0.158) 

-0.176 

(-0.200; 

-0.155) 

b3 

-0.071 

(-0.104; 

-0.061) 

-0.070 

(-0.098; 

-0.056) 

bi 

0.026 

(-0.027; 0.037) 

0.033 

(-0.023; 0.047) 

T = 

0.75 






bi 

-0.328 

(-0.423; 

-0.297) 

-0.327 

(-0.414; 

-0.287) 

b 2 

-0.114 

(-0.130; 

-0.093) 

-0.113 

(-0.131; 

-0.093) 

b3 

-0.001 

(-0.020; 0.020) 

0.003 

(-0.023; 0.019) 


Table 12: CD4 data. Estimated LDO class model parameters for lqHMM+LDO at different 
quantiles. 95% bootstrap confidence intervals are reported within brackets. 



[to 

r = 0.25 

= 5,G = 5] 

[to 

r = 0.50 

= 4,G = 4] 

r = 0.75 

[to = 4, G = 3] 

Aoi 

-2.385 

(-3.583;-1.383) 

-1.062 

(-2.112;-0.241) 

-0.374 (-1.388; 0.615) 

A 02 

-0.082 

(-1.159; 0.993) 

1.113 

(0.013; 2.102) 

2.739 (1.295; 4.379) 

Ao3 

1.555 

(0.514; 2.627) 

4.089 

(2.002; 5.299) 


Ao4 

3.116 

(1.926; 4.388) 





-0.174 

(-0.290;-0.059) 

-0.193 

(-0.318;-0.065) 

-0.184 (-0.324;-0.066) 
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log(count +1) 


Figures 


Figure 1: CD4 data. Response variable distribution at each time occasion. 
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log(count +1) 


Figure 2: CD4 data. Longitudinal trajectories within LDO classes for IqHMM+LDO at differ¬ 
ent quantiles 


t = 0.25 
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Figure 3: Simulation study. A = (1, 2.75,—0.3). Distribution of the adjusted rand index for 
IqHMM+LDO (light grey) and IqmHMM (dark grey) in the different simulation scenarios at 
different quantiles 


x = 0.25 x = 0.50 x = 0.75 



Scenarios Scenarios Scenarios 


Figure 4: Simulation study. A = (5, 8.5,—1.1). Distribution of the adjusted rand index for 
IqHMM+LDO (light grey) and IqmHMM (dark grey) in the different simulation scenarios at 
different quantiles 


x = 0.25 x = 0.50 x = 0.75 
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